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Abstract In this article we propose an algorithm, NESTA-LASSO, for the LASSO problem (i.e., an 
undcrdetermined linear least-squares problem with a one-norm constraint on the solution) that 
exhibits linear convergence under the restricted isometry property (rip) and some other reasonable 
assumptions. Inspired by the state-of-the-art sparse recovery method, nesta, we rely on an accel- 
erated proximal gradient method proposed by Nesterov in 1983 that takes 0(y/l/e) iterations to 
come within e > of the optimal value. We introduce a modification to Nesterov's method that 
regularly updates the prox-center in a provably optimal manner, resulting in the linear convergence 
of nesta-LASSO under reasonable assumptions. 

Our work is motivated by recent advances in solving the basis pursuit denoising (bpdn) problem 
(i.e., approximating the minimum one-norm solution to an undcrdetermined least squares problem). 
Thus, one of our goals is to show that nesta-LASSO can be used to solve the bpdn problem. We use 
nesta-LASSO to solve a subproblcm within the Pareto root-finding method used by the state-of- 
the-art bpdn solver SPGLl. The resulting algorithm is called PARNES, and we show, experimentally, 
that it is comparable to currently available solvers. 
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1 Introduction 

We would like to find a solution to the sparsest recovery problem with noise 

min ||a;||o s.t. \\Ax — b\\2 < cr. (1) 

Here, a specifies the noise level, A is an m-by-n matrix with m <^ n, and ||x||o is the number of 
nonzero entries of x. This problem comes up in fields such as image processing [25], seismics [23] 
|2"T] . astronomy 0, and model selection in regression [T5]. Since ([!]) is known to be ill-posed and 
NP-hard [18,23 , various convex, Zi-relaxed formulations are often used. 

Relaxing the zero-norm in ([!]) gives the basis pursuit denoising (bpdn) problem 

BP(ct) min||x||i s.t. || Ax — &|| 2 < a. (2) 

The special case of cr = is the basis pursuit problem |12j . Two other commonly used Zi-relaxations 
are the LASSO problem [30] 

ls(t) min \Ax - 6|| 2 s.t. ||x||i < r, (3) 

and the penalized least-squares problem 

Qp(A) min \Ax - &||| + A||x||i, (4) 

proposed by Chen, Donoho, and Saunders [12]. A large amount of work has been done to show 
that these formulations give an effective approximation of the solution to ([lj; see [TSlfHTHTO] . In 
particular, under certain conditions on the sparsity of the solution x, x can be recovered exactly 
provided that A satisfies the restricted isometry property (rip). 

There are a wide variety of algorithms which solve the BP(er), Qp(A), and ls(t) problems. Refer 
to Section[5]for descriptions of some of the current algorithms. Our work has been motivated by the 
accuracy and speed of the recent solvers NESTA and SPGLl. In [Mj, Nesterov presents an algorithm 
to minimize a smooth convex function over a convex set with an optimal convergence rate. An 
extension to the nonsmooth case is presented in [25] . nesta solves the bp(ct) problem using the 
nonsmooth version of Nesterov's work. 

For appropriate parameter choices of cr, A, and r, the solutions of BP(cr), QP(A), and Ls(r) 
coincide [55] . Although the exact dependence is usually hard to compute [33], there are solution 
methods which exploit these relationships. The Matlab solver SPGLl is based on the Pareto root- 
finding method [33] which solves BP(cr) by approximately solving a sequence of ls(t) problems. In 
SPGLl, the LS(t) problems are solved using a spectral projected-gradient (spg) method. 

While we are ultimately interested in solving the BPDN problem in our main result is an 
algorithm for solving the LASSO problem. Our algorithm, nesta-lasso (cf. Algorithm [2]) , effec- 
tively uses Nesterov's work to solve the LASSO problem. Additionally, we propose a modification 
of Nesterov's work which results in the local linear convergence of NESTA-LASSO under reasonable 
assumptions. Finally, we show that replacing the SPG method in the Pareto root-finding procedure, 
used in SPGLl, with our NESTA-LASSO method leads to an effective method for solving bp(ct). We 
call this modification PARNES and compare its efficacy with the state-of-the-art solvers presented 
in Section [5] 
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1.1 Organization of the paper 

In Section [2] we present and describe the background of NESTA-LASSO. We show in Section [3 that, 
under some reasonable assumptions, NESTA-LASSO can exhibit linear convergence. In Section 2J we 
describe the Pareto root-finding procedure behind the BPDN solver SPGLl and show how NESTA- 
LASSO can be used to solve a subproblem. Section 5 describes some of the available algorithms 
for solving bpdn and the equivalent Qp(A) problem. Lastly, in Section 6, we show in a series of 
numerical experiments that using nesta- LASSO in SPGLl to solve bpdn is comparable with current 
competitive solvers. 



2 NESTA-LASSO 



We present the main parts of our method to solve the LASSO problem. Our algorithm, NESTA-LASSO 
(cf. Algorithm [5]), is an application of the accelerated proximal gradient algorithm of Nesterov [2~3] 
outlined in Section |2.1 Additionally, we have a prox-center update improving convergence which 
we describe in Section 3j In each iteration, we use the fast ^-projector of Duchi, Shalev-Shwartz, 
Singer, and Chandra [H] given in Section 2.3 



2.1 Nesterov's Algorithm 

Let Q C R" be a convex closed set. Let / : Q — > K be smooth, convex and Lipschitz diffcrcntiablc 
with L as the Lipschitz constant of its gradient, i.e. 

||V/(x) - V/(y)|| <L\\x-y\\, for all x,y E Q. 

Nesterov's accelerated proximal gradient algorithm iteratively defines a sequence x k as a judiciously 
chosen convex combination of two other sequences y k and Zk, which are in turn solutions to two 
quadratic optimization problems on Q. The sequence z k involves a strongly convex prox-function, 
d(x), which satisfies 

d{x)>^\\x-c\\l (5) 

For simplicity, we have chosen the right hand side of ^ with a = 1 as our prox-function throughout 
this paper. The c in the prox-function is called the prox-center. With this prox-function, we have: 



y k = argminV/(x fc ) T (y - x k ) + -\\y - x k \\l, 




Nesterov showed that if x* is the optimal solution to 

mmf(x), 



4 



Ming Gu et al. 



then the iterates defined above satisfy 

f(y k ) -/(*•) <^h)IK -41 = o(A). (6) 

An implication is that the algorithm requires O(y^Ljs) iterations to bring f(yk) to within e > of 
the optimal value. 



Algorithm 1 Accelerated proximal gradient method for convex minimization 

Input: function /, gradient V/, Lipschitz constant L, prox-center c. 
Output: x* = argmin^gQ /(#) 

1: initialize xq\ 

2: for k = 0,1,2, ... , do 

3: compute f(xk) and V/(x^); 

4: = argmin agQ V/(x fc ) T (y - + jt\\y - x k \\%; 

5: z ft = argmin zeQ £*L ^[/OO + V/(xi) T (« - x t )] + f c|||; 

6: Xfe = fc+3 2fe + fc+S 4 " 1 ' 
7: end for 



In [25], Nesterov extends his work to minimize nonsmooth convex functions /. Nesterov shows 
that one can obtain the minimum by applying his algorithm for smooth minimization to a smooth 
approximation of /. Since V/ M is shown to have Lipschitz constant = if /i is chosen to 
be proportional to e, it takes O (i) iterations to bring f(xk) within e of the optimal value. 

The recent algorithm NESTA solves BP(<r) using Nesterov's algorithm for nonsmooth minimiza- 
tion. Our algorithm, NESTA-LASSO, solves ls(t) using Nesterov's smooth minimization algorithm. 
We are motivated by the accuracy and speed of nesta, and the fact that smooth version of Nes- 
terov's algorithm has a faster convergence rate than the nonsmooth version. 

2.2 NESTA-LASSO: An accelerated proximal gradient algorithm for LASSO 

We apply Nesterov's accelerated proximal gradient method in Algorithm [l] to the LASSO problem 
ls(t), and in each iteration, we use the fast ^-projector projj described in the next section. The 
pseudocode for this is given in Algorithm [2] We make one slight improvement to Algorithm [T] 
namely, we update our prox-centers from time to time. In fact, we will see in Section [3] that the 
prox-centers may be updated in an optimal fashion and that this leads to linear convergence under 
a suitable application of RIP (see Corollary [T| for details) . 

In our case, / = |||6 — Aa;|||, V/ = A*(Ax — b), and Q is the one-norm ball ||x||i < r. The 
initial point xq is used as the prox-center c. To compute the iterate y k , we have 

Vk = argminV/(a; fc ) T (y - x k ) + — \\y - x k \\l 

\\v\\i<r 2 

= argminy T y - 2(x k - Vf(x k )/L) T y 

\\v\\i<r 

= argmin||y - (x k - \7f(x k )/L)\\ 2 

\\v\\i<r 
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where proj 1 (w,r) returns the projection of the vector v onto the one-norm ball of radius r. By 
similar reasoning, computing Zk can be shown to be equivalent to computing 

Zk = proji [c-y V "—T^^fixi), T ) . 




Algorithm 2 NESTA-LASSO algorithm with prox-center updates 

Input: initial point xo, LASSO parameter t, tolerance rj. 
Output: a) T = argmin{||6 — Ax'||2 | \\x\\i <">"}• 

1: for j = 0, . . . , jmax, do 

2: Cj = xo, h = 0, r = b - Ax , g = -A T r , r) = \\r \\ 2 - (b T r - t"|| 9o II oo )/ 1| r- 1| 2 ; 

3: for k = 0, . . . , fcmax, do 

4: if < e~ 2 t)q then 

5: return y k ,r) k 

6: end if 

7: y k = proj^Xfc - g k /L, r); 

8: h k = h k + ^^g k ; 

9: 2 fe = proj^Cj - h k /L, r); 

10: x fc = ^z k + ^|j/ fc ; 

11: r k = b — Ax k ; 

12: 9fc = -A T r fe ; 

13: Vk = \\r k h - (b T r k - r\\g k \\oo)/\\rk\\r, 

14: end for 

15: x = y k ; 

16: if rj k < rj then 

17: return x T = y k \ 

18: end if 

19: end for 



2.3 One Projector 

The one-projector, proj 1; is used twice in each iteration of Algorithm [2j We briefly describe the 
algorithm of Duchi, Shalev-Schwartz, Singer, and Chandra [13] for fast projection to an ^-ball 
in high-dimension. A similar algorithm is presented in |33) . The algorithm costs 0(n log n) in the 
worst case, but it has been shown to cost much less experimentally [33]. Note that the two calls 
to one-projector in each iteration can be reduced to one call with results in Tseng's paper ^32 . 
However, we make no change to our algorithm due to the low cost of the one- project or. 

Consider the projection of an n-vector c onto the one-norm ball ||a:||i < r. This is given by the 
minimization problem 

proj 1 (c, r) :— argmin||c — a; || 2 s.t. ||x||i<r. (7) 

X 

Without loss of generality, the symmetry of the one-norm ball allows us to assume that c has all 
nonnegative entries. Assuming the coefficients of the vector c are ordered from largest to smallest, 
the solution x* = proj 1 (c, r) is given by 

rn i r-(cH hc fc ) 

x i = max{0, Ci — rj} with rj = (8) 
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where k is the largest index such that rj < Ck- Please refer to 33 ; 14J for more details on the 
implementation. 



3 Optimality 

In NESTA-LASSO, we update the prox-center, c, in a provably optimal way which yields a local 
linear convergence rate. We analyze the case where the solution x* is s-sparse. The case where x* 
is approximately sparse has not been analyzed; we plan to do this for future work. Nesterov has 
shown that when his accelerated proximal gradient algorithm is applied to minimize the objective 
function /, the fcth iterate yk and the minimizer x* satisfy 

f(y k )-f(x*)<^-^\\x*-cf 2 (9) 

where L is the Lipschitz constant for V/ and c is the prox-center [2M25]. 

In our case, f(x) = \\Ax — b\\\ where the undcrdetermined matrix A is understood to represent 
a dictionary of atoms, usually sampled via some measurements. We will assume that A satisfies the 
restricted isometry property (rip) of order 2s as described in [HUH]. Namely, there exists a constant 
S 2s € (0, 1) such that 

(l-S 2s )\\xf 2 <\\Axf 2 <(l + 5 2s )\\x\\ 2 2 (10) 

whenever ||x||o < 2s. Since the RIP helps ensure that the solution to (JlJ is closely approximated 
by the solution to ^ [9], and we are ultimately interested in solving |2]), this is a reasonable 
assumption. 

Additionally, we assume the iterates yk are s-sparse when sufficiently close to x* . This is rea- 
sonable since each iteration of NESTA-LASSO involves projecting the current iteration onto a 1-norm 
ball. Due to the geometry of the projection, there is a high likelihood that our assumption will 
hold. Under the RIP and the assumption that x* is s-sparse, it follows by Theorem 3.2 in j^Hj that 
the LASSO problem has a unique solution. Since the one-norm ball is compact, this implies that yk 
converges to x* . 

Let 8 = 1 — S 2s . We have 

\\A(x* - y k )\\l + 2(y k - x*) T A T (Ax* -b) = f(y k ) - f(x*) > \\A(y k - x*)\\\ > 5\\y k - x*\\. 

To see the first inequality, let y — x* +r(j/fc — x*) for r G [0, 1]. Due to the convexity of the one-norm 
ball, y is feasible. Since x* is the minimum, for any r € [0, 1], 

f(y) - fix*) = t 2 \\A(x* - y k )\\l + 2r{y k - x*) T A T (Ax* - b) > 0. 

Thus, (y k - x*) T A T (Ax* -b)>0. The second inequality follows from ((lOj. Then from (J9]> , we have 



Putting everything together gives 
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The above relation and ([9| suggest that we can speed up Algorithm [I] by updating the prox-center, 
c, with the current iterate yu every k steps. With our assumptions, we prove in the following theorem 
that there is an optimal number of such steps. Allow the iterates to be represented by yjk where j 
is the number of times the prox-center has been changed (the outer iteration) and k is number of 
iterations after the last prox-center change (the inner iteration). 

Theorem 1 Suppose A satisfies the restricted isometry property of order 2s, the solution x* is 
s-sparse, and the iterates yjk are eventually s-sparse. Then for each j , 

11%-fcopt ~x*\\< l/e||%i -at*|| 

where 

T 



k o P t = e\jj (12) 

and e is the base of the natural logarithm. Moreover, the total number of iterations, Jtot&opt* to get 
\\Ujk ~~ x *\\2 < £ is minimized with this choice of k opt - 



Proof First observe that (11) implies 



when 




2 < e\\Vn 



This relation allows us to choose k to minimize the product jk. Since 

k log e 

jk = 



log \/L/5 — logfc 

taking derivative of the expression on the right shows that jk is minimized when 

^opt = 6 

where e is the base of the natural logarithm. Thus, 

jtotfcopt = -ey^loge. 

The assumption that the iterates yjk will eventually be s-sparse is reasonable precisely because 
we expect the LASSO relaxation ([3| to be a good proxy for In other words, we expect our 
solutions to ([3]) to be sparse. Of course this argument is merely meant to be a heuristic; a more 
rigorous justification along the lines of 1 1 3 j may be possible and is in our future plans. 



Let fc opt be as in (12 1. For each prox-center change, j, perform fc opt inner iterations, and let 
Pj ~ Vjkapt t" e the output to the jth outer iteration. An immediate consequence of Theorem [T] is 
that the relative decrease after fc opt steps of the inner iteration in Algorithm [2] is e -2 , i.e. 



\\Pj -x*\\t < e 2 \\Pj-i ~x*\\l 



and in general, this is the best possible. 
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Table 1 Number of products, Na, with A and A* for NESTA-LASSO without prox-center updates 
Number of Rows of A Number of Columns of A r Nj\ 

100 256 6.28 69 

200 512 12.6 77 

400 1024 25.1 157 



Table 2 Number of products, Na, with A and A* for NESTA-LASSO with prox-center updates 
Number of Rows of A Number of Columns of A r Na 

100 256 6.28 37 

200 512 12.6 47 

400 1024 25.1 45 



Corollary 1 If A satisfies the restricted isometry property of order 2s, the solution x* is s-sparse, 
and the iterates pj are eventually s-sparse, then Algorithm^is linearly convergent. 

In our experiments, there are some cases where updating the prox-center will eventually cause 
the duality gap to jump to a higher value than the previous iteration. This can cause the algorithm 
to run for more iterations than necessary A check is added to prevent the prox-center from being 
updated if it no longer helps. 

In Tables [l] and [2j we give some results showing that updating the prox-center is effective when 
using NESTA-LASSO to solve the LASSO problem. 



4 PARNES 

In applications where the noise level of the problem is approximately known, it is preferable to 
solve bp(ct). The Pareto root-finding method used by van den Berg and Friedlander |33] interprets 
bp(ct) as finding the root of a single-variable nonlinear equation whose graph is called the Pareto 
curve. Their implementation of this approach is called SPGLl. In SPGLl, an inexact version of 
Newton's method is used to find the root, and at each iteration, an approximate solution to the 
LASSO problem, ls(t), is found using an SPG approach. Refer to for more information on the 
inexact Newton method. In Section [6j we show experimentally that using NESTA-LASSO in place 
of the SPG approach can lead to improved results. We call this version of the Pareto root-finding 
method, parnes. The pseudocode of PARNES is given in Algorithm [3] 

4.1 Pareto Curve 

Suppose A and b are given, with ^ b £ range(^4). The points on the Pareto curve are given by 
(r, f(r)) where </?(t) = \\Ax T — 6|| 2 , t = Hx,-^, and x T solves ls(t). The Pareto curve gives the 
optimal trade-off between the 2-norm of the residual and 1-norm of the solution to ls(t). It can 
also be shown that the Pareto curve also characterizes the optimal trade-off between the 2-norm of 
the residual and 1-norm of the solution to BP (a). Refer to [33] for a more detailed explanation of 
these properties of the Pareto curve. 
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Fig. 1 An example of a Pareto curve. The solid line is the Pareto curve; the dotted red lines give two iterations of 
Newton's method. 



Let t bp be the optimal objective value of BP(0). The Pareto curve is restricted to the interval 
r e [0, r BP ] with ip(0) = \\b\\ 2 > and </?(r BP ) = 0. The following theorem, proved in [33], shows 
that the Pareto curve is convex, strictly decreasing over the interval r g [0,t bp ], and continuously 
differentiable for r £ (0, r BP ). 

Theorem 2 The function <p is 

1. convex and nonincreasing. 

2. continuously differentiable for t G (0,t bp ) with <p'{t) = — X T where X T = \\A T y T \\ 00 is the optimal 
dual variable to LS(r) and y T = r T /||r r ||2 with r T = Ax T — b. 

3. strictly decreasing and ||x r ||i = r for r € [0,t bp ]. 



4.2 Root Finding 

Since the Pareto curve characterizes the optimal trade-off for both BP(ct) and Ls(r), solving bp(ct) 
for a fixed a can be interpreted as finding a root of the non-linear equation ip(r) = a. The iterations 
consist of finding the solution to ls(t) for a sequence of parameters —¥ r CT where r a is the optimal 
objective value of bp(<t). 

Applying Newton's method to ip gives 

Tfc+i = r fc + (a - ip(T k ))/cp'(r k ). 

Since ip is convex, strictly decreasing and continuously differentiable, — > r a superlinearly for all 
initial values To € (0,r BP ) (see Proposition 1.4.1 in [4 ). By Theorem[2j f(rk) is the optimal value to 
LS(rfc) and <p'( T fe) is the dual solution to LS(rfc). Since evaluating ip(rk) involves solving a potentially 
large optimization problem, an inexact Newton method is carried out with approximations of ^( T fc) 
and <p'(T k ). 
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Let y T and A r be the approximations of the y T and A r defined in Theorem |2j The duality gap 
at each iteration is given by 

Vr = \\f T \\ 2 - {b T y T - tA t ). 
The authors of |33j have proved the following convergence result. 

Theorem 3 Suppose A has full rank, a G (0, ||6|L), and the inexact Newton method generates a 
sequence T k — > r CT . If rjk '■= rj Tk — > and tq is close enough to T a , we have 

- r a \ = 71% + Ck\n -r a \, 

where (k — > and 71 is a positive constant. 
4.3 Solving the LASSO problem 

Approximating y(rfe) and f'ijk) require approximately minimizing ls(t). The solver SPGLl uses a 
spectral projected-gradient (spg) algorithm. The method follows the algorithm by Birgin, Martinez, 
and Raydan [Fj and is shown to be globally convergent. The costs include evaluating Ax, A T r, and 
a projection onto the one-norm ball x < r. In parnes, we replace this SPG algorithm with our 
algorithm, NESTA-LASSO (cf. Algorithm [f]) . 



Algorithm 3 parnes: Pareto curve method with NESTA-LASSO 

Input: initial point xq, bpdn parameter <r, tolerance rj. 
Output: 1, = argmin{||a;|| 1 | \\Ax — fo 1 1 2 < cr} 

1: T = 0, <p = U&II2, f'o = ll^ T 6|[cx>; 

2: for k = 0, . . . , fcmax, do 

3: r fc+1 = r fe + (a - tp k )/ip' k ; 

4: x k+1 = NESTA-LASSO(xfe,Tfe + i,»;); 

5: r k+1 = b- Ax k+1 ; 

6: tp k+1 = ||rfe +1 || 2 ; 

7: v'fc+i = -ll^^+illoo/lk'fc+ilh; 

8: if ||rfe +1 || 2 - a < r) ■ max{l, ||rfe +1 || 2 } then 

9: return x a = x k+ i; 

10: end if 
11: end for 



5 Other solution techniques and tools 

In the NESTA paper, [3], extensive experiments are carried out, comparing the effectiveness of state- 
of-the-art sparse reconstruction algorithms. The code used to run these experiments is available at 
http://www.acm.caltech.edu/~nesta. We have modified this NESTA experiment infrastructure 
to include PARNES in its tests and repeated some of the tests in [3] with the same experimental 
standards and parameters. Refer to the [3] for a detailed description of the experiments. The 
algorithms tested and their experimental details are described below. Note that the algorithms 
either solve BP(er) or Qp(A). 
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5.1 NESTA [3J 

NESTA is used to solve bp(ct). Its code is available at http://www.acm.caltech.edu/~nesta 

The parameters for nesta are set to be 

x = A*b, fi = 0.02, 

where xq is the initial guess and \x is the smoothing parameter for the one-norm function in bp(ct). 

Continuation techniques are used to speed up NESTA in [3j. Such techniques are useful when 
it is observed that a problem involving some parameter A is faster for large A, [271H9]. Thus, the 
idea of continuation is to solve a sequence of problems for decreasing values of A. In the case of 
nesta, it is observed that convergence is faster for larger values of /i. When continuation is used 
in the experiments, there are four continuation steps with = ||£o||oo and /it = (Wa'cO^'Vo for 
* = 1,2,3,4. 



5.2 GPSR: Gradient Projection for Sparse Reconstruction [16] 



GPSR is used to solve the penalized least-squares problem Qp(A). The code is available at http: 
|//www. lx. it .pt7 ~mtf /GPSR The problem is first recast as a bound-constrained quadratic program 
(bcqp) by using a standard change of variables on x. Here, x = u\ — U2, and the variables are now 
given by [ttijUa] where the entries are positive. The new problem is then solved using a gradient 
projection (gp) algorithm. The parameters are set to the default values in the following experiments. 

A version of GPSR with continuation is also tested. The number of continuation steps is 40, the 
variable tolerancea is set to 10~ 3 , and the variable minitera is set to 1. All other parameters 
are set to their default values. 



5.3 SpaRSA: Sparse reconstruction by separable approximation [17] 

SPARSA is used to minimize functions of the form 4>{x) — f(x) + Xc(x) where / is smooth and c 
is non-smooth and non-convex. The Qp(A) problem is a special case of functions of this form. The 
code for SPARSA is available at |http : //www . lx . it . pt/~mt f /SpaRSA 

In a sense, SPARSA is an iterative shrinkage/thresholding algorithm. Utilizing continuation and 
a Brazilai-Borwein heuristic pQ to find step sizes, the speed of the algorithm can be increased. The 
number of continuation steps is set to 40 and the variable minitera is set to 1. All remaining 
variables are set to their default values. 



5.4 SPGL1 [33J and SPARCO [31] 

SPGL1 is available at http: //www. cs .ubc . ca/labs/scl/spgll The parameters for our numerical 



experiments are set to their default values. 

Due to the vast number of available and upcoming algorithms for sparse reconstruction, the 
authors of SPGLl and others have created SPARCO |34j . In SPARCO, they provide a much needed 
testing framework for benchmarking algorithms. It consists of a large collection of imaging, com- 
pressed sensing, and geophysics problems. Moreover, it includes a library of standard operators 
which can be used to create new test problems. SPARCO is implemented in Matlab and was origi- 



nally created to test SPGLl. The toolbox is available at http://www.cs.ubc.ca/labs/scl/sparco 
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5.5 FISTA: Fast iterative soft-thresholding algorithm 2 
fista solves Qp(A). It can be thought of as a simplified version of the Nesterov algorithm in Sec- 



tion 



2.1 since it involves two sequences of iterates instead of three. In Section 4.2 of [3J, fista is 



shown to give very accurate solutions provided enough iterations are taken. Due to its ease of use 
and accuracy, fista is used to compute reference solutions in [3J and in this paper. The code for 
fista can be found in the nesta experiments code at http://www.acm.caltech.edu/~nesta. 



5.6 FPC: Fixed point continuation [191.120] 

fpc solves the general problem min^ H^Hj + A/(.t) where f(x) is differentiable and convex. The 
special case with f(x) = | ||^4cc — b\f 2 is the QP(A) problem. The algorithm is available at |http :] 
//www. caam.rice . edu/~op t imization/Ll/f pc[ 

FPC is equivalent to iterative soft-thresholding. The approach is based on the observation that 
the solution solves a fixed-point equation x — F{x) where the operator F is a composition of a 
gradient descent-like operator and a shrinkage operator. It can be shown that the algorithm has 
q-linear convergence and also, finite-convergence for some components of the solution. Since the 
parameter A affects the speed of convergence, continuation techniques arc used to slowly decrease A 
for faster convergence. A more recent version of fpc, fpc-BB, uses Brazilai-Borwein steps to speed 
up convergence. Both versions of FPC are tested with their default parameters. 



5.7 FPC-AS: Fixed-point continuation and active set [35] 

fpc- AS is an extension of FPC into a two-stage algorithm which solves QP(A). The code can be found 
at http://www.caam.rice.edu/~optimization/Ll/fpc, It has been shown in [19 that applying 
the shrinkage operator a finite number of times yields the support and signs of the optimal solution. 
Thus, the first stage of FPC-AS involves applying the shrinkage operator until an active set is 
determined. In the second stage, the objective function is restricted to the active set and \\x\\^ is 
replaced by c T x where c is the vector of signs of the active set. The constraint Cj • xi > is also 
added. Since the objective function is now smooth, many available methods can now be used to 
solve the problem. In the following tests, the solvers L-BFGS and conjugate gradients, CG (referred 
to as FPC-AS (cg)), are used. Continuation methods are used to decrease A to increase speed. 
For experiments involving approximately sparse signals, the parameter controlling the estimated 
number of nonzeros is set to n, and the maximum number of subspace iterations is set to 10. The 
other parameters are set to their default values. All other experiments were tested with the default 
parameters. 



5.8 Bregman [36] 

The Bregman Iterative algorithm consists of solving a sequence of Qp(A) problems for a fixed A 
and updated observation vectors b. Each QP(A) is solved using the Brazilai-Borwein version of fpc. 
Typically, very few (around four) outer iterations are needed. Code for the Bregman algorithm can 
be found at http://www.caam.rice.edu/~optimization/Ll/2006/10/bregman-iter 



ative-algorithms-f or .html| All parameters are set to their default values. 
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5.9 SALSA [MEZ] 

There is a new state-of-the-art method solving BP (a) called SALSA which has been shown to be 
competitive with SPGLl and NESTA. The experimental results in our paper do not include SALSA, 



but we hope to make comparisons in future work. The code for SALSA can be found at http: 



// cascais . lx. it . pt/~maf onso/salsa.html 



6 Numerical results 



As mentioned above, some of the algorithms we test solve QP(A) and some solve bp (a). Comparing 
the algorithms thus requires a way of finding a (a, A) pair which for which the solutions coincide. 
The tests in [3] use a two-step procedure. From the noise level, a is chosen, and then SPGLl is used 
to solve BP(cr). The SPGLl dual solution gives an estimate of the corresponding A and then fista 
is used to compute a second a corresponding to this A with high accuracy. 

In the NESTA experiments, fista is used to determine the accuracy of the computed solutions 
while nesta is used to compute a solution that is used in the stopping criteria. Please refer to [3] 
for a complete description of the experimental details. Section 4.2 of [3J, shows that fista gives very 
accurate solutions provided enough iterations are taken. For each test, fista is run twice, fista is 
first run, with no limit on the number of iterations, until the relative change in the function value is 
less than 1CP 14 . This solution is used to determine the accuracy of the computed solutions, fista is 
ran a second time with one of the following stopping criterion; the results of this run are recorded 
in the tables. 

In [3J, NESTA (with continuation), is used to compute a solution £nes- In the tests, the following 
stopping criteria are used where is the fcth-iteration in the algorithm being tested. 



\\xk\W < ||xnes|Ui and ||6 - Ax k \\t 2 < 1.05 1|6 - Ac NE s||^ 2 , (13) 

or 

In other words, the algorithms are run until they achieve a solution at least as accurate as NESTA. 
The rationale for having two stopping criteria is to reduce any potential bias arising from the 



fact that some algorithms solve Qp(A), for which (14) is the most natural, while others solve BP(er) 



for which ( 13 1 is the most natural. It is evident from the tables below that there is not a significant 



difference whether ( [13] ) or (14) is used. The algorithms are recorded to not have converged (dnc) 
if the number of calls to A or A* exceeds 20,000. 

In Tables [4] and [5] we repeat the experiments done in Tables 5.1 and 5.2 of [3]. These experiments 
involve recovering an unknown signal that is exactly s-sparse with n — 262144, m — n/8, and 
s = m/5. The experiments are performed with increasing values of the dynamic range d where 
d = 20,40,60,80,100 dB. For each run, the measurement operator is a randomly subsampled 
discrete cosine transform, and the noise level is set to 0.1. 

The dynamic range, d, is a measure of the ratio between the largest and smallest magnitudes of 
the non-zero coefficients of the unknown signal. Problems with a high dynamic range occur often in 
applications. In these cases, high accuracy becomes important since one must be able to detect and 
recover lower-power signals with small amplitudes which may be obscured by high-power signals 
with large amplitudes. 
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Table 3 Comparison of accuracy using experiments from Table UJ Dynamic range 100 dB, <r = 0.100, fi = 0.020, 



sparsity level s 


= m/5. Stopping rule is 


<nu> 








Methods 


N A 


IMIi 


||Az-&|| 2 


lias-it* ||i 

Ha:* I1 1 
II"* 111 


|[& X 1 1 oo 


\\x - x*[|a 


PARNES 


632 


942197.606 


2.692 


0.000693 


8.312 


46.623 


NESTA 


15227 


942402.960 


2.661 


0.004124 


45.753 


255.778 


NESTA + CT 


787 


942211.581 


2.661 


0.000812 


9.317 


52.729 


GPSR 


DNC 


DNC 


DNC 


DNC 


DNC 


DNC 


GPSR + CT 


11737 


942211.377 


2.725 


0.001420 


15.646 


90.493 


SPARSA 


693 


942197.785 


2.728 


0.000783 


9.094 


51.839 


SPGLl 


504 


942211.520 


2.628 


0.001326 


14.806 


84.560 


FISTA 


12462 


942211.540 


2.654 


0.000363 


4.358 


26.014 


FPC-AS 


287 


942210.925 


2.498 


0.000672 


9.374 


45.071 


FPC-AS (CG) 


361 


942210.512 


2.508 


0.000671 


9.361 


45.010 


FPC 


9614 


942211.540 


2.719 


0.001422 


15.752 


90.665 


FPC-BB 


1082 


942209.854 


2.726 


0.001378 


15.271 


87.963 


BREGMAN-BB 


1408 


942286.656 


1.326 


0.000891 


9.303 


52.449 



Table 4 Number of function calls where the sparsity level is s = m/5 and the stopping rule is <| 1 3 1 



Method 


20 dB 


40 dB 


60 dB 


80 dB 


100 dB 


PARNES 


122 


172 


214 


470 


632 


NESTA 


383 


809 


1639 


4341 


15227 


NESTA + CT 


483 


513 


583 


685 


787 


GPSR 


61 


622 


5030 


DNC 


DNC 


GPSR + CT 


271 


219 


357 


1219 


11737 


SPARSA 


323 


387 


465 


541 


693 


SPGLl 


58 


102 


191 


374 


504 


FISTA 


69 


267 


1020 


3465 


12462 


FPC-AS 


209 


231 


299 


371 


287 


FPC-AS (CG) 


253 


289 


375 


481 


361 


FPC 


474 


386 


478 


1068 


9614 


FPC-BB 


164 


168 


206 


278 


1082 


BREGMAN-BB 


211 


223 


309 


455 


1408 



The last two tables, Tables [6] and [7j replicate Tables 5.3 and 5.4 of [3]. There are five runs of 
each experiment. Each run involves an approximately sparse signals obtained from a permutation 
of the Haar wavelet coefficients of a 512 x 512 image. The measurement vector b consists of m = 
n/8 = 512 2 /8 = 32, 768 random discrete cosine measurements, and the noise level is set to 0.1. For 
more specific details, refer to [3]. In applications, the signal to be recovered is often approximately 
sparse rather than exactly sparse. Again, high accuracy is important when solving these problems. 

It can be seen that nesta + CT, SPARSA, SPGLl, parnes, and both versions of FPC-AS perform 
well in the case of exactly sparse signals for all values of the dynamic range. However, in the case of 
approximately sparse signals, all versions of FPC and SPARSA no longer converge in 20,000 function 
calls. PARNES still performs well, converging in under 2000 iterations for all runs. The accuracy of 
the various algorithms is compared in Table [3| 
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Table 5 Number of function calls where the sparsity level is s = m/5 and the stopping rule is \14\ 



Method 


20 dB 


40 dB 


60 dB 


80 dB 


100 dB 


PARNES 


74 


116 


166 


364 


562 


NESTA 


383 


809 


1639 


4341 


15227 


NESTA + CT 


483 


513 


583 


685 


787 


GPSR 


62 


618 


5026 


DNC 


DNC 


GPSR + CT 


271 


219 


369 


1237 


11775 


SPARSA 


323 


387 


463 


541 


689 


SPGLl 


43 


99 


185 


365 


488 


FISTA 


72 


261 


1002 


3477 


12462 


FPC-AS 


115 


167 


159 


371 


281 


FPC-AS (CG) 


142 


210 


198 


481 


355 


FPC 


472 


386 


466 


1144 


9734 


FPC-BB 


164 


164 


202 


276 


1092 


BREGMAN-BB 


211 


223 


309 


455 


1408 



Table 6 Recovery results of an approximately sparse signal (with Gaussian noise of variance 1 added) and with l |14| 
as a stopping rule. 



Method 


Run 1 


Run 2 


Run 3 


Run 4 


Run 5 


PARNES 


838 


810 


1038 


1098 


654 


NESTA 


8817 


10867 


9887 


9093 


11211 


NESTA + CT 


3807 


3045 


3047 


3225 


2735 


GPSR 


DNC 


DNC 


DNC 


DNC 


DNC 


GPSR + CT 


DNC 


DNC 


DNC 


DNC 


DNC 


SPARSA 


2143 


2353 


1977 


1613 


DNC 


SPGLl 


916 


892 


1115 


1437 


938 


FISTA 


3375 


2940 


2748 


2538 


3855 


FPC-AS 


DNC 


DNC 


DNC 


DNC 


DNC 


FPC-AS (CG) 


DNC 


DNC 


DNC 


DNC 


DNC 


FPC 


DNC 


DNC 


DNC 


DNC 


DNC 


FPC-BB 


5614 


7906 


5986 


4652 


6906 


BREGMAN-BB 


3288 


1281 


1507 


2892 


3104 



Tab le 7 Recovery results of an approximately sparse signal (with Gaussian noise of variance 0.1 added) and with 
| |14[ l as a stopping rule. 



Method 


Run 1 


Run 2 


Run 3 


Run 4 


Run 5 


PARNES 


1420 


1772 


1246 


1008 


978 


NESTA 


11573 


10457 


10705 


8807 


13795 


NESTA + CT 


7543 


13655 


11515 


3123 


2777 


GPSR 


DNC 


DNC 


DNC 


DNC 


DNC 


GPSR + CT 


DNC 


DNC 


DNC 


DNC 


DNC 


SPARSA 


12509 


DNC 


DNC 


3117 


DNC 


SPGLl 


1652 


1955 


2151 


1311 


2365 


FISTA 


10845 


12165 


10050 


7647 


11997 


FPC-AS 


DNC 


DNC 


DNC 


DNC 


DNC 


FPC-AS (CG) 


DNC 


DNC 


DNC 


DNC 


DNC 


FPC 


DNC 


DNC 


DNC 


DNC 


DNC 


FPC-BB 


DNC 


DNC 


DNC 


DNC 


DNC 


BREGMAN-BB 


3900 


3684 


2045 


3292 


3486 
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6.1 Choice of parameters 

As Tseng observed, accelerated proximal gradient algorithms will converge so long as the condition 
given as equation (45) in [32] is satisfied. In our case this translates into 



mm 



Vf(y k ) T x + ~\\x - x k \\l + P{x) \ > Wf(y k ) T y k + P(y k ), (15) 



upon setting "f k = 1 and 



P(x) = 



[0 if||x||i<T, 

I oo otherwise, 



in (45) in |32| . In other words, the value of L need not necessarily be fixed at the Lipschitz constant 
of V/ but may be decreased and decreasing L has the same effect as increasing the stepsize. Tseng 
suggested to decrease L adaptively by a constant factor until (45) is violated then backtrack and 
repeat the iteration (cf. Note 6 in [35]). For simplicity, and very likely at the expense of speed, we 
do not change our L adaptively in parnes and NESTA-LASSO. Instead, we choose a small fixed L 



by trying a few different values so that (151 is satisfied for all k, and likewise for the tolerance rj in 
Algorithm [2] However, even with this crude way of selecting L and rj, the results obtained are still 
rather encouraging. 



7 Conclusions 

As seen in the numerical results, SPGLl and nesta are among some of the top performing solvers 
available for basis pursuit denoising problems. We have therefore made use of Nesterov's accelerated 
proximal gradient method in our algorithm NESTA-LASSO and shown that updating the prox-center 
leads to improved results. Through our experiments, we have shown that using NESTA-LASSO in the 
Pareto root-finding method leads to results comparable to the results given by currently available 
state-of-the-art methods. 
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